Various implementations of the Toffoli gate up to a relative phase have been known for years.

Their advantage over the regular Toffoli gate is a smaller circuit size. However, their use has
often been limited to demonstrations of quantum control in designs where the Toffoli gate is
applied last, or where, for some other specific reason, the relative phase does not matter. It
was commonly believed that the relative phase deviations would prevent the relative phase Toffolis
from being very helpful in practical large-scale designs.

In this paper, we report three circuit identities that provide the means for replacing certain
configurations of the multiple control Toffoli gates with their simpler relative phase implementations,
up to a selectable unitary on certain qubits, and without changing the overall functionality. We
illustrate the advantage by applying those identities to the optimization of the known circuits
implementing multiple control Toffoli gates, and report the reductions in the CNOT-count, the
T-count, and the number of ancillae used. We suggest that a further study of the relative phase
Toffoli implementations and their use may yield other optimizations.

PACS numbers: 03.67.Lx, 03.67.Ac 


I. INTRODUCTION 

Multiple control Toffoli gates are the staple of quantum arithmetic and reversible circuits.
They are employed widely within quantum algorithms, including in reversible transformations,
such as arithmetic circuits and all sorts of Boolean operations over quantum registers, as well
as subroutines within other specialized quantum transforms. Unfortunately, multiple control
Toffoli gates are not simple operations, and must be implemented using a certain library of
elementary gates: physically attainable transformations for physical-level implementations, and
fault-tolerant gates on the logical level. As of the time of this writing, the most advanced and
developed trapped ions [I| and superconducting Q quantum information processing approaches allow
computations over at most a few dozen qubits using at most a few dozen two-qubit gates. The
smallest of the multiple control Toffoli gates, the three-qubit Toffoli gate, requires six CNOT
gates as a physical-level circuit over a controlling apparatus that allows the application of
the CNOT and arbitrary single-qubit gates, and seven T gates as a logical fault-tolerant circuit
over the Clifford+T library without ancillae. The known implementations of larger multiple
control Toffoli gates come at a substantially higher cost. This makes the multiple control
Toffoli gates expensive computing primitives. As such, the ability to replace them with simpler
counterparts that nevertheless guarantee the overall functional integrity, as well as their
optimization (multiple control Toffoli gates are implemented using smaller size multiple control
Toffolis [3,1^), are important in practice. Ultimately, the difficulty of implementing Toffoli
gates may even be a deciding factor in the ability to run an experiment of a desired size.
Indeed, consider a scenario where only a fixed number of certain elementary gates can be applied.
Imagine the goal is to run a discrete logarithm type computation [^. Since circuits implementing
such an algorithm are dominated by reversible arithmetic operations, which in turn rely on the
Toffoli gates, it is conceivable that optimizing Toffoli implementations would yield a resource
count that is possible to execute for a desired size computation. Multiple control Toffoli gates
are, of course, important beyond just the discrete logarithm type algorithms.


The goal of this paper is to provide a framework for replacing multiple control Toffoli gates
with their simpler relative phase implementations. The advantage is illustrated through an
optimization of the implementations of the multiple control Toffoli gates. The reported
optimization is viewed as a motivating example rather than a complete and finished study. An
in-depth look at the implementations of the relative phase multiple control Toffoli gates and
their use in the optimization of arbitrary quantum circuits is likely to yield more results.


To draw a classical analogy, relative phase Toffoli gates may turn out to play a role analogous
to the classical NAND gates: while classical (quantum/reversible) circuits are designed using a
set of operations convenient for a human (multiple control Toffolis), a compiler may decompose
those into NAND gates (relative phase multiple control Toffolis) before they are mapped into
lowest-level transistors (elementary quantum gates).

II. DEFINITIONS


In this paper, we will work with pure $n$-qubit quantum states
$\sum_{i=0}^{2^n-1} \alpha_i |i\rangle$ and quantum transformations described by the
$2^n \times 2^n$ unitary matrices $U$. Recall that a square matrix $U$ is called unitary if its
inverse equals its conjugate transpose, $U^{-1} = U^{\dagger}$. While the property of unitarity
defines evolutions that are possible to attain physically, it does not prescribe which ones may
be implemented directly. To assist with the presentation of the material, we will discretize the
family of transformations that may be obtained physically, and call them elementary quantum
gates. This does not limit the applicability of the results: indeed, discrete circuits may be
thought of as certain versions of continuous Hamiltonians, but are otherwise easier to work
with. In particular, in this work we will rely on the following elementary gates: Pauli-X,
$X = NOT = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, Pauli-Z,
$Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, its roots Phase,
$P = \sqrt{Z} = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}$, and
$T = \sqrt{P} = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}$, and Pauli-Y,
$Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$. A fourth root of Y will be mentioned in some
constructions, in the form of $R_y(\pi/4)$, which is equivalent to the fourth root of Y up to a
global phase. Recall that
$R_y(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -\sin\frac{\theta}{2} \\ \sin\frac{\theta}{2} & \cos\frac{\theta}{2} \end{pmatrix}$.
Finally, for completeness we will need the Hadamard gate,
$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, and the two-qubit CNOT
gate that we introduce via the mapping of kets, rather than the 4x4 matrix, as
$CNOT(a,b): |a,b\rangle \mapsto |a, b \oplus a\rangle$, and everywhere else by linearity, due to
the simplicity of such a definition.
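The gate definitions above can be checked numerically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names (`matmul`, `dagger`, `close`, `is_unitary`, `ry`) are introduced here for illustration and are not part of the paper.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two square matrices."""
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

def is_unitary(u):
    """Check that the conjugate transpose of u is its inverse."""
    n = len(u)
    eye = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    return close(matmul(dagger(u), u), eye)

X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
P = [[1, 0], [0, 1j]]                               # Phase gate, P*P = Z
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]      # T*T = P
H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def ry(theta):
    """Rotation R_y(theta) in the convention recalled in the text."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

# CNOT built from the ket mapping |a, b> -> |a, b XOR a>; column = input, row = output.
CNOT = [[0] * 4 for _ in range(4)]
for a in (0, 1):
    for b in (0, 1):
        CNOT[2 * a + (b ^ a)][2 * a + b] = 1

r = ry(math.pi / 4)
r4 = matmul(matmul(r, r), matmul(r, r))             # fourth power, equals R_y(pi)
minus_i_Y = [[-1j * e for e in row] for row in Y]   # Y multiplied by the global phase -i
```

Comparing `r4` with `minus_i_Y` illustrates the statement that $R_y(\pi/4)$ is a fourth root of Y up to a global phase.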

Quantum circuits are defined as the strings of quantum gates, or otherwise products of matrices
that correspond to the individual gates. For multiple qubit circuit computations via matrices, a
proper Kronecker product needs to be taken to compute matrix products. For example, a two-qubit
operation corresponding to the Hadamard gate on the first qubit is given by the matrix
$H \otimes Id$, where $Id$ is the identity applied to the second qubit. Recall that the product
of matrices is taken in reverse order with respect to the order of gates in the corresponding
circuit. Following the standard notations, circuits/unitaries composed of quantum gates/matrices
X, Y, Z, P, H, and CNOT are called Clifford. These unitaries play an important role in quantum
error correction, but are not complete (moreover, they are simulable classically with a
polynomial size effort) for quantum computation. As such, for completeness, a circuit library
needs to contain a non-Clifford gate, such as the T gate. The addition of any non-Clifford gate
to the Clifford circuits furthermore turns out to result in the computational universality Q.
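The Kronecker product and the reverse ordering of matrix products recalled above can be sketched as follows. This is an illustrative plain-Python example; the helper names `kron`, `circuit_matrix`, and `apply`, and the choice of a Hadamard followed by a CNOT, are assumptions made here and do not come from the paper.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def circuit_matrix(gates):
    """Matrix of a circuit whose gates are listed in time order (left to right)."""
    n = len(gates[0])
    m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for g in gates:
        m = matmul(g, m)          # each later gate multiplies from the left
    return m

def apply(m, v):
    """Apply matrix m to the state vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
ID = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

H_first = kron(H, ID)             # Hadamard on the first qubit, identity on the second

# Circuit "H then CNOT" acting on |00>: the matrix is CNOT * (H x Id).
bell = apply(circuit_matrix([H_first, CNOT]), [1, 0, 0, 0])
# The opposite gate order gives a different state, so the ordering matters.
other = apply(circuit_matrix([CNOT, H_first]), [1, 0, 0, 0])
```

Here `bell` has equal amplitudes on $|00\rangle$ and $|11\rangle$, whereas `other` has equal amplitudes on $|00\rangle$ and $|10\rangle$.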

The above is meant to be a quick reminder of some basic facts and an introduction of the
notations used in this paper. For an in-depth review we refer the reader to [1].

For convenience, we furthermore use the following notations: for a set of variables/qubits
$X = \{x_1, x_2, \ldots, x_n\}$, $|X|$ equals $n$, being the number of individual qubits in this
set, and the conjunction (Boolean AND) of variables, $x_1 x_2 \cdots x_n$, is denoted as simply
$x$. When the number of variables in the set $X$ is zero, we assign $x$ the value of 1. When the
set of variables $X$ consists of a single element, $\{x\}$, the conjunction of the variables
within the set, as well as the name of the variable, coincide; this does not however cause any
issues.

We next define the multiple control Toffoli gates. 

Definition 1. A multiple control Toffoli gate over a set of $n$ qubits with the set
$X = \{x_1, x_2, \ldots, x_{n-1}\}$ being the controls, and qubit $y$ being the target,
$TOF^n(X;y)$, is defined as the matrix

$$\mathrm{diag}\left\{1, 1, \ldots, 1, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\right\}.$$

We will sometimes omit the superscript and write $TOF(X;y)$ when the controls and the target are
explicitly specified and the size of the multiple control Toffoli gate can thus be restored.
Similarly, we may omit the specification of the qubits the gate operates on and write $TOF^n$
when we are only concerned with the size of the gate. Finally, we may write $TOF$ when the goal
is to specify the kind of gate being the Toffoli and distinguish it from other kinds of gates.
Observe that when $|X| = 0$ the above definition reports the Pauli-X (NOT) gate, for $|X| = 1$
the definition introduces the CNOT gate, when $|X| = 2$ it reduces to the usual Toffoli gate
$TOF^3$, and for larger sets $X$ it gives the multiply-controlled Toffolis: Toffoli-4, Toffoli-5,
etc.

An alternate definition of the multiple control Toffoli gate may cast it in the form of the
mapping of kets, as follows, $TOF^n(X;y): |X; y\rangle \mapsto |X; y \oplus x\rangle$. In some
cases, the mapping of kets may be easier to operate with than the corresponding unitary matrix.
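The two definitions of the multiple control Toffoli gate can be compared directly. Below is a minimal sketch that builds the matrix from the ket mapping; the function name `toffoli_matrix` and the convention that the controls occupy the most significant bits of the basis-state index are assumptions made for this illustration.

```python
def toffoli_matrix(n):
    """TOF^n built from the ket mapping |X; y> -> |X; y XOR x>.

    Qubits x_1 .. x_{n-1} (most significant bits) are the controls,
    the last qubit y is the target. Column = input, row = output.
    """
    dim = 2 ** n
    m = [[0] * dim for _ in range(dim)]
    all_ones = 2 ** (n - 1) - 1
    for i in range(dim):
        controls, y = i >> 1, i & 1
        # x is the conjunction of the controls; it is 1 when there are no controls.
        x = 1 if controls == all_ones else 0
        m[(controls << 1) | (y ^ x)][i] = 1
    return m

NOT = toffoli_matrix(1)     # |X| = 0
CNOT = toffoli_matrix(2)    # |X| = 1
TOF3 = toffoli_matrix(3)    # |X| = 2, the usual Toffoli gate
TOF4 = toffoli_matrix(4)
```

The resulting matrices have the block form of Definition 1: ones on the diagonal followed by a single 2x2 NOT block in the lower right corner.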

In our constructions, relative phase implementations 
of quantum unitary transformations play a major role. 
For the purpose of this work, we define relative phase 
implementations as follows. 

Definition 2. A relative phase version of a quantum $n$-qubit unitary operation
$U = \{u_{i,j}\}|_{i,j=0..2^n-1}$ is any $n$-qubit unitary $V = \{v_{i,j}\}|_{i,j=0..2^n-1}$ such
that $|v_{i,j}| = |u_{i,j}|$ for all $i$ and $j$.

In other words, a relative phase version or otherwise implementation of a unitary $U$ is a
unitary $V$ such that the elements of the two matrices differ by $e^{i\phi}$, where
$\phi \in \mathbb{R}$, and $\phi$ may be different for different matrix elements. Observe that
$e^{i\phi} \cdot 0 = 0$, therefore relative phase versions of unitaries have zeroes everywhere
the original unitary does.
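Definition 2 translates into a short numerical test. The sketch below is illustrative only: the function names and the particular phases placed in the candidate matrix `CAND` are hypothetical choices, not gates taken from the paper.

```python
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(u, tol=1e-12):
    n = len(u)
    ud = [[complex(u[j][i]).conjugate() for j in range(n)] for i in range(n)]
    prod = matmul(ud, u)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def is_relative_phase_version(u, v, tol=1e-12):
    """True when v is unitary and |v_ij| = |u_ij| for all i, j (Definition 2)."""
    n = len(u)
    same_abs = all(abs(abs(u[i][j]) - abs(v[i][j])) < tol
                   for i in range(n) for j in range(n))
    return same_abs and is_unitary(v, tol)

# The usual Toffoli gate TOF^3 = diag{1, ..., 1, [[0, 1], [1, 0]]}.
TOF3 = [[1 if i == j else 0 for j in range(8)] for i in range(8)]
TOF3[6], TOF3[7] = TOF3[7], TOF3[6]

# A candidate with hypothetical phases: -1 on |101>, and i, -1 inside the 2x2 block.
CAND = [row[:] for row in TOF3]
CAND[5][5] = -1
CAND[6][7] = 1j
CAND[7][6] = -1

IDENTITY = [[1 if i == j else 0 for j in range(8)] for i in range(8)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
NOT_UNITARY = [[s, s], [s, s]]    # same magnitudes as H, but not a unitary matrix
```

The last matrix shows why the definition asks for $V$ to be unitary in addition to matching the magnitudes of $U$.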

To illustrate, a relative phase multiple control Toffoli gate over the set of controls
$X = \{x_1, x_2, \ldots, x_{n-1}\}$ with the target $y$, $RTOF(X;y)$, can be written as follows,

$$\mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^n-3}, \begin{pmatrix} 0 & z_{2^n-1} \\ z_{2^n-2} & 0 \end{pmatrix}\right\},$$

where $z_i$ are arbitrary length-1 complex numbers. Prefix "R" is used to distinguish the
relative phase version from the multiple control Toffoli gate itself. Observe that when all
$z_i = 1$, the respective relative phase Toffoli gate $RTOF(X;y)$ becomes the multiple control
Toffoli gate $TOF(X;y)$, and when all $z_i$ take the same but fixed value $z$, the respective
relative phase Toffoli gate $RTOF(X;y)$ implements the multiple control Toffoli gate $TOF(X;y)$
up to an undetectable global phase $z$.

A relative phase multiple control Toffoli gate $RTOF^n$ may be thought of as a product of the
multiple control Toffoli gate $TOF^n$ and an $n$-qubit diagonal unitary $D^n$. Indeed, for a
diagonal unitary $D^n := \mathrm{diag}\{z_0, z_1, \ldots, z_{2^n-1}\}$, circuit $TOF^n D^n$
implements a generic relative phase multiple control Toffoli gate

$$R_1TOF^n = \mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^n-3}, \begin{pmatrix} 0 & z_{2^n-2} \\ z_{2^n-1} & 0 \end{pmatrix}\right\},$$

whereas circuit $D^n TOF^n$ implements a generic relative phase multiple control Toffoli gate

$$R_2TOF^n = \mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^n-3}, \begin{pmatrix} 0 & z_{2^n-1} \\ z_{2^n-2} & 0 \end{pmatrix}\right\}.$$

Observe how both gates are relative phase multiple control Toffoli gates, but differ in the last
two non-zero elements, which are permuted. We will exploit this property in the circuit diagrams.
In particular, of the two possible decompositions of the relative phase multiple control Toffoli
gate into a product of the multiple control Toffoli and a diagonal unitary, we will select
$TOF^n D^n$ to be the canonic one, and draw the respective relative phase multiple control
Toffoli gate with the same controls as the diagonal gate $D^n$ and a distorted target, such as
illustrated in Figure 1(c). The helpful intuition behind this pictorial representation is as
follows: a Toffoli gate $TOF(X;y)$ may be combined with a diagonal gate $D(Z)$,
$Z \subseteq \{X, y\}$, following it to obtain a relative phase Toffoli gate, or a Toffoli gate
$TOF(X;y)$ may be combined with a diagonal gate $D(Z)$, $Z \subseteq \{X, y\}$, preceding it to
obtain the inverse of a relative phase Toffoli gate; conversely, each relative phase Toffoli gate
or its inverse may be broken down into a suitable pair of the multiple control Toffoli gate and
the diagonal gate.
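The difference between the two decompositions can be seen on the three-qubit case. In the sketch below the phases `z` are hypothetical values chosen only for illustration, and the circuit "TOF followed by D" is mapped to the matrix product with D on the left, following the reverse-order convention recalled in Section II.

```python
import cmath

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

# TOF^3 = diag{1, 1, 1, 1, 1, 1, [[0, 1], [1, 0]]}.
TOF3 = diag([1] * 8)
TOF3[6], TOF3[7] = TOF3[7], TOF3[6]

z = [cmath.exp(0.3j * k) for k in range(8)]   # hypothetical phases z_0 .. z_7
D = diag(z)

R1 = matmul(D, TOF3)   # circuit TOF^3 D^3: the Toffoli is applied first, then D
R2 = matmul(TOF3, D)   # circuit D^3 TOF^3: D is applied first, then the Toffoli
```

Both `R1` and `R2` keep the zero pattern of the Toffoli gate; they agree on the first six diagonal entries and carry the last two phases in swapped positions inside the 2x2 block.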

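An important property of the relative phase multiple control Toffoli gates is that every one of
those is an inverse of some other relative phase multiple control Toffoli gate. Indeed, for

$$R_1TOF^n = \mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^n-3}, \begin{pmatrix} 0 & z_{2^n-1} \\ z_{2^n-2} & 0 \end{pmatrix}\right\}$$

and

$$R_2TOF^n = \mathrm{diag}\left\{w_0, w_1, \ldots, w_{2^n-3}, \begin{pmatrix} 0 & w_{2^n-1} \\ w_{2^n-2} & 0 \end{pmatrix}\right\},$$

$R_1TOF^n = R_2^{-1}TOF^n$ when $w_i = z_i^{-1}$ for $i = 0 \ldots 2^n-3$,
$w_{2^n-2} = z_{2^n-1}^{-1}$, and $w_{2^n-1} = z_{2^n-2}^{-1}$.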
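The inverse relation can be verified by a direct matrix product. The following sketch assumes the block layout $\begin{pmatrix} 0 & z_{2^n-1} \\ z_{2^n-2} & 0 \end{pmatrix}$ for both gates; the helper name `rtof` and the phases are hypothetical.

```python
import cmath

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rtof(n, z):
    """diag{z_0, ..., z_{2^n-3}, [[0, z_{2^n-1}], [z_{2^n-2}, 0]]}."""
    dim = 2 ** n
    m = [[0] * dim for _ in range(dim)]
    for i in range(dim - 2):
        m[i][i] = z[i]
    m[dim - 2][dim - 1] = z[dim - 1]
    m[dim - 1][dim - 2] = z[dim - 2]
    return m

n = 3
dim = 2 ** n
z = [cmath.exp(0.7j * (k + 1)) for k in range(dim)]   # hypothetical unit-modulus phases

# Phases of the inverse gate: plain inverses on the diagonal part,
# inverses with swapped indices inside the 2x2 block.
w = [1 / z[i] for i in range(dim - 2)]
w.append(1 / z[dim - 1])    # w_{2^n-2} = 1 / z_{2^n-1}
w.append(1 / z[dim - 2])    # w_{2^n-1} = 1 / z_{2^n-2}

R1 = rtof(n, z)
R2 = rtof(n, w)
product = matmul(R1, R2)
```

The product is the identity matrix, so `R2` is the inverse of `R1` and is itself a relative phase Toffoli gate.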

We next define special form relative phase multiple 
control Toffoli gates, that are important in some of the 
constructions that follow. 


Definition 3. For a set $X = \{x_1, x_2, \ldots, x_n\}$, and its subset
$X' = \{x_{i_1}, x_{i_2}, \ldots, x_{i_k}\}$, a type-$X'$ special form relative phase multiple
control Toffoli gate, $S^{X'}RTOF(x_1, x_2, \ldots, x_{n-1}; x_n)$, is defined as the matrix

$$\mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^n-3}, \begin{pmatrix} 0 & z_{2^n-2} \\ z_{2^n-1} & 0 \end{pmatrix}\right\},$$

where every pair of complex numbers $z_s$ and $z_t$ are equal whenever the binary expansions of
$s$ and $t$ are different only in the digits $i_1, i_2, \ldots, i_{k-1}$, and $i_k$.

To illustrate, a type-$\{x_1\}$ $S^{\{x_1\}}RTOF(x_1, x_2, \ldots, x_{n-1}; x_n)$ is given by the
matrix

$$\mathrm{diag}\left\{z_0, z_1, \ldots, z_{2^{n-1}-1}, z_0, z_1, \ldots, z_{2^{n-1}-3}, \begin{pmatrix} 0 & z_{2^{n-1}-2} \\ z_{2^{n-1}-1} & 0 \end{pmatrix}\right\}.$$

The type-$\{x_1\}$ special form relative phase Toffoli gate $S^{\{x_1\}}RTOF$ has half the number
of the degrees of freedom compared to the equal size unrestricted relative phase Toffoli gate
$RTOF$. In practice, this suggests that it should be easier to find an efficient circuit
implementing a relative phase Toffoli gate than it is to find one of the same size for a
type-$\{x_1\}$ special form relative phase Toffoli gate. To give another example, a type-$X$
$S^{X}RTOF(x_1, x_2, \ldots, x_{n-1}; x_n)$ is the most restrictive of the kind. It is equal to
the respective Toffoli gate up to a global phase, and thereby does not give much freedom in
implementing by a circuit over the $TOF(x_1, x_2, \ldots, x_{n-1}; x_n)$. This means that in the
practical constructions, and whenever possible, we will try to use a type-$X'$ special form
relative phase multiple control Toffoli gate with the smallest size set $X'$.
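The constraint in Definition 3 and the counting of degrees of freedom can be sketched as follows. The functions `is_special_form` and `free_phases`, the 1-based qubit indexing with $x_1$ as the most significant bit, and the sample phases are all assumptions made for this illustration.

```python
import cmath

def is_special_form(z, n, special):
    """Check the type-X' condition on the phases z_0 .. z_{2^n-1}.

    `special` lists the (1-based) indices of the qubits in X'; x_1 is taken to
    be the most significant bit. The condition "z_s = z_t whenever s and t
    differ only in the digits of X'" is equivalent to z_s = z_{s with the X'
    digits cleared} for every s.
    """
    mask = 0
    for q in special:
        mask |= 1 << (n - q)
    return all(abs(z[s] - z[s & ~mask]) < 1e-12 for s in range(2 ** n))

def free_phases(n, special):
    """Number of independent phases left by the type-X' condition."""
    return 2 ** (n - len(special))

n = 3
# Hypothetical phases that depend only on the digits of x_2 and x_3.
z_type_x1 = [cmath.exp(0.4j * (s & 0b011)) for s in range(2 ** n)]
# A single common phase: the most restrictive, type-X case.
z_global = [cmath.exp(0.7j)] * (2 ** n)
```

For $n = 3$, the type-$\{x_1\}$ condition leaves 4 of the 8 phases free, which matches the "half the number of the degrees of freedom" remark, and the type-$X$ condition leaves a single global phase.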

An alternate and equivalent definition of a type-$X'$ special form relative phase Toffoli gate is
via a transformation given by the circuit
$TOF(x_1, x_2, \ldots, x_{n-1}; x_n) D(X \setminus X')$. It furthermore serves as a basis for how
we draw SRTOF gates in the circuit diagrams. Compared to the multiple control Toffoli gate, every
control/target in the set $X \setminus X'$ of the special form RTOF appears distorted by the
dingbat originating from the respective $D(X \setminus X')$, and every control/target in the set
$X'$ appears undistorted, see Figure 1(d).

Beyond having fewer degrees of freedom compared to an unrestricted relative phase Toffoli gate,
there is one more important difference between the special form relative phase Toffoli gates and
the relative phase Toffoli gates: the inverse of a type-$X'$ special form relative phase Toffoli
gate is not always a type-$X'$ special form relative phase Toffoli gate.

The use of subscripts allows us to distinguish different versions of the relative phase and
special form relative phase multiple control Toffoli gates. For instance, notations $R_1TOF$ and
$R_2TOF$ indicate that both gates are some relative phase Toffoli gates, but they are not
necessarily related. In contrast, an $R_1^{-1}TOF$ is the inverse of the $R_1TOF$. Recall that a
circuit implementing the inverse operation may be constructed by replacing each gate in the
circuit implementing the given unitary with its inverse (conjugate transpose) and inverting
their order. Observe further that any two TOF gates of the same size are represented by
identical matrices; this is not always true for some two RTOF or a pair of SRTOF, therefore the
ability to distinguish different versions of the relative phase implementations is important, as
these could be different gates.

FIG. 1: (a) a multiple control Toffoli gate $TOF(X;y)$, (b) a diagonal gate $D_1(X;y)$ and its
inverse; observe how different diagonal gates can be visually distinguished by the number within
the control, and a diagonal gate and its inverse are related by the different color of the
control, (c) a relative phase multiple control Toffoli gate $R_1TOF(X;y)$ and its inverse
$R_1^{-1}TOF(X;y)$, (d) a type-$y$ special form $R_1TOF(X;y)$, and its inverse, and (e) a
controlled-unitary $U$ implemented up to some relative phase. The -/- symbol denotes a
multiqubit register.

We will draw quantum gates and circuits using standard notations, including the relative phase
gates per diagrams found in Figure 1, with time propagating from left to right. Some useful
circuit identities clarifying and summarizing the above discussions are shown next.




1. [Circuit identities: two diagrams.] These identities show the canonic decomposition of the
special form relative phase Toffoli gates $R_1TOF$ into the product of the multiple control
Toffoli gate $TOF$ and the diagonal gate $D_1$; read right-to-left, these rules show how to
combine a suitable pair of the multiple control Toffoli gate and the diagonal gate into a
(special form) relative phase Toffoli gate. When $Y = \emptyset$, the second circuit
illustrates the $R_1TOF$ gate.

2. [Circuit identities: two diagrams.] These identities show the canonic decomposition of the
inverses of the special form relative phase Toffoli gates, $R_1^{-1}TOF$, into the product of
the diagonal gate and the multiple control Toffoli gate. Indeed, looking at the second of the
two identities,

$$S^{Y}R_1^{-1}TOF(X,Y;z) = \left(TOF(X,Y;z) D_1(X,z)\right)^{-1} = D_1^{-1}(X,z) TOF(X,Y;z),$$

being the circuit pictured on the right hand side.

3. [Circuit identity: $R_1TOF = R_2^{-1}TOF$.] In other words, every $R_1TOF$ is also an
$R_2^{-1}TOF$ under the proper choice of relative phases.


In general, for any reversible gate R{X) its rela¬ 
tive phase version could be thought of as a product 
R{X)D{X), for a proper diagonal unitary D{X). This 
suggests a possible route in which the work reported in 
this paper may be extended. 


III. MAIN RESULT 


Our main result is summarized in the next three Propositions. We apply it to obtain multiple
Corollaries, and to optimize multiple control Toffoli gates in the section that follows. The
proofs of these three propositions rely on the three circuit identities concluding the previous
section, as well as the following notion: the controlled-$U$ implemented up to a relative phase,
$RCU(V, W; X)$, commutes with the controlled-$V$ implemented up to a relative phase,
$RCV(V, Y; Z)$, where the qubit sets $V$, $W$, $X$, $Y$, and $Z$ are disjoint. This rule also
applies to show that any two non-intersecting unitaries commute. We assume the reader's
familiarity with the above commutation rule, and do not explicitly prove it here.

Proposition 1. The conjugation of the controlled unitary $U$ over the qubit set $Z$ implemented
up to a possible relative phase, $R_1CU(Y, a; Z)$, by a pair of multiple control Toffoli gates
$TOF(X;a)$ allows the replacement of these multiple control Toffoli gates with their relative
phase versions implemented up to any desired unitary $V(X)$, such as illustrated next:
Proof: The proof is accomplished via the following set of circuit transformations:

$$\begin{aligned}
& TOF(X;a)\, R_1CU(Y,a;Z)\, TOF(X;a) \\
&= TOF(X;a)\, D_2(X;a)\, D_2^{-1}(X;a)\, R_1CU(Y,a;Z)\, TOF(X;a) \\
&= TOF(X;a)\, D_2(X;a)\, R_1CU(Y,a;Z)\, D_2^{-1}(X;a)\, TOF(X;a) \\
&= R_2TOF(X;a)\, R_1CU(Y,a;Z)\, R_2^{-1}TOF(X;a) \\
&= R_2TOF(X;a)\, V(X)\, V^{-1}(X)\, R_1CU(Y,a;Z)\, R_2^{-1}TOF(X;a) \\
&= \left[R_2TOF(X;a)\, V(X)\right] R_1CU(Y,a;Z) \left[V^{-1}(X)\, R_2^{-1}TOF(X;a)\right].
\end{aligned}$$

□
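The chain of transformations in the proof can be checked numerically on a small instance. The sketch below assumes $X = \{x_1, x_2\}$, an empty set $Y$, a single-qubit set $Z = \{z\}$, and the qubit order $x_1, x_2, a, z$; the particular unitaries and phases are hypothetical and serve only to exercise the identity.

```python
import cmath
import math

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def dagger(a):
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def close(a, b, tol=1e-9):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

def eye(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]
P = [[1, 0], [0, 1j]]
T = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

TOF3 = eye(8)
TOF3[6], TOF3[7] = TOF3[7], TOF3[6]
tof = kron(TOF3, eye(2))                                  # TOF(X; a) on four qubits

U = matmul(H, T)                                          # some single-qubit unitary
CU = [[1, 0, 0, 0], [0, 1, 0, 0],
      [0, 0, U[0][0], U[0][1]], [0, 0, U[1][0], U[1][1]]]
E = diag([cmath.exp(0.2j * k) for k in range(4)])         # relative phases on (a, z)
rcu = kron(eye(4), matmul(E, CU))                         # R_1CU(a; z)

D2 = kron(diag([cmath.exp(0.5j * k * k) for k in range(8)]), eye(2))   # D_2(X; a)
V = kron(kron(H, P), eye(4))                              # V(X)

# Left hand side: circuit TOF, R_1CU, TOF.
lhs = matmul(tof, matmul(rcu, tof))
# Right hand side: circuit [TOF, D_2, V], R_1CU, then the inverse of the first bracket.
rtof_v = matmul(V, matmul(D2, tof))
rhs = matmul(dagger(rtof_v), matmul(rcu, rtof_v))
```

Both sides agree because the diagonal gate on $X \cup a$ and the unitary on $X$ commute with the middle gate, which is the commutation rule the proof relies on.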

The result of Proposition 1 can be reduced to the following form once $RCU(Y, a; Z)$ is set to
implement the Toffoli type gate, $TOF(Y, a; z)$:

[Circuit identity (1). Input labels: $|X\rangle$, $|Y\rangle$, $|0\rangle$, $|z\rangle$; output
labels: $|X\rangle$, $|Y\rangle$, $|0\rangle$, $|z \oplus xy\rangle$.]

Indeed, the corresponding circuit on the left hand side in (1) computes

$$|X, Y, 0, z\rangle \xrightarrow{TOF(X;0)} |X, Y, x, z\rangle \xrightarrow{TOF(Y,x;z)} |X, Y, x, z \oplus xy\rangle \xrightarrow{TOF(X;x)} |X, Y, 0, z \oplus xy\rangle,$$

which is indicated by the formulas on the output side. This, in turn, leads to the following
corollary.

Corollary 1. An $n$-qubit Toffoli gate $TOF^n$ can be implemented with the cost not exceeding
the sum of twice the cost of an $n$-qubit relative phase Toffoli gate $RTOF^n$ and the cost of
the CNOT gate, using one ancilla qubit set to and returned in the value $|0\rangle$. In other
words, in the presence of such an ancilla,

$$Cost(TOF^n) \le 2 \times Cost(RTOF^n) + Cost(CNOT).$$

This corollary may be reformulated for a different choice of the middle gate, e.g., as follows:
$Cost(TOF^n) \le 2 \times Cost(RTOF^{n-1}) + Cost(TOF^3)$.
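The ancilla-based construction behind Corollary 1 can be traced on computational basis states. The sketch below uses exact Toffoli gates acting on classical bits, so it illustrates only the left hand side of the identity (it does not model relative phases); the qubit names and the sizes $|X| = 2$, $|Y| = 1$ are assumptions made for the example.

```python
from itertools import product

def tof(bits, controls, target):
    """Multiple control Toffoli on a basis state: flip `target` when all controls are 1."""
    out = dict(bits)
    if all(bits[c] for c in controls):
        out[target] ^= 1
    return out

def compute(x1, x2, y, z):
    """Run TOF(X; a), TOF(Y, a; z), TOF(X; a) with the ancilla a starting in 0."""
    state = {"x1": x1, "x2": x2, "y": y, "a": 0, "z": z}
    state = tof(state, ["x1", "x2"], "a")   # a now holds x = x1 AND x2
    state = tof(state, ["y", "a"], "z")     # z becomes z XOR x*y
    state = tof(state, ["x1", "x2"], "a")   # ancilla is returned to 0
    return state

results = {bits: compute(*bits) for bits in product((0, 1), repeat=4)}
```

For every input, the ancilla ends in 0 and the target holds $z \oplus x_1 x_2 y$, which is the action of a Toffoli gate controlled by $X \cup Y$.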

Other gate configurations are also supported by the relative phase Toffolis. The following
Proposition complements the set of basic rules on which we base the proposed optimization
approach.

Proposition 2. Consider the conjugation of a controlled-$U$ gate $R_1CU(W; X, Y)$, implemented
possibly up to some relative phase, by a pair of identical multiple control Toffoli gates, such
as illustrated in (2) on the left hand side. Then, the following circuit identity holds for any
unitary transformation $V$ over the qubit set $\{Z \cup a\}$ and any $S^{Y}R_2TOF(Y, Z; a)$ (a
type-$Y$ special form relative phase Toffoli gate):



Proof: This proposition may be proved similarly to Proposition 1:

$$\begin{aligned}
& TOF(Y,Z;a)\, R_1CU(W;X,Y)\, TOF(Y,Z;a) \\
&= TOF(Y,Z;a)\, D_2(Z;a)\, D_2^{-1}(Z;a)\, R_1CU(W;X,Y)\, TOF(Y,Z;a) \\
&= TOF(Y,Z;a)\, D_2(Z;a)\, R_1CU(W;X,Y)\, D_2^{-1}(Z;a)\, TOF(Y,Z;a) \\
&= S^{Y}R_2TOF(Y,Z;a)\, R_1CU(W;X,Y)\, S^{Y}R_2^{-1}TOF(Y,Z;a) \\
&= S^{Y}R_2TOF(Y,Z;a)\, V(Z;a)\, V^{-1}(Z;a)\, R_1CU(W;X,Y)\, S^{Y}R_2^{-1}TOF(Y,Z;a) \\
&= \left[S^{Y}R_2TOF(Y,Z;a)\, V(Z;a)\right] R_1CU(W;X,Y) \left[V^{-1}(Z;a)\, S^{Y}R_2^{-1}TOF(Y,Z;a)\right].
\end{aligned}$$

An alternate proof may be constructed via restricting $W$, $X$, $Y$, and $Z$ to contain at most
a single qubit each, and multiplying the corresponding matrices. The benefit of considering such
a matrix multiplication is in the ability to show that the $S^{Y}R_2TOF(Y, Z; a)$ turns out to
be the relative phase Toffoli gate that allows most freedom in selecting relative phases for a
general unitary $U$, which allows formulating this proposition as an "if-and-only-if" statement.
Furthermore, looking at the matrices helps to expand the set of possible allowed relative phase
replacements once $U$ is known. □

The results of Propositions 1 and 2 may be generalized via introducing a control set $P$ that
controls all three gates on the left hand side as well as all five gates on the right hand side,
and a control set $Q$ that controls all gates except $V$.

Observe that, between them, the two Propositions cover all situations in which a relative phase
controlled-$U$ is conjugated by a pair of multiple control Toffoli gates such that the targets
of those multiple control Toffoli gates do not intersect with the $U$, resulting in the ability
to replace a pair of multiple control Toffoli gates with a pair of simpler gates. A similar
circuit identity may be developed for the scenario when the target of the multiple control
Toffolis intersects with the qubits used by the unitary $U$. This circuit identity relies on the
special form relative phase Toffoli gates. We have not yet found practical examples where such a
circuit identity would yield an advantage and the results of Propositions 1 and 2 do not apply,
but formulate the statement of the respective Proposition for completeness.
Proposition 3. The conjugation of the controlled unitary $U$ implemented up to a relative phase,
$R_1CU(X; Y, Z, a)$, by a pair of the multiple control Toffoli gates $TOF(W, Z; a)$ allows the
replacement of these multiple control Toffoli gates with the type-$\{Z \cup a\}$ special form
relative phase version (up to a multiplication by any desired unitary $V(W)$) and its inverse,
as follows:



We do not include an explicit proof, but mention that it may be obtained similarly to that of
Propositions 1 and 2. Furthermore, we note that the scenario where $R_1CU(X; Y, Z, a)$ is a
diagonal gate, e.g., a controlled-$R_z$ implemented up to a possible relative phase, is better
handled by applying Proposition 1 than Proposition 3 (e.g., see item 2, Subsection III A).
Indeed, Proposition 1 uses the most generic unspecified type relative phase Toffoli, and any
such controlled diagonal gate may be thought of as a targetless gate ($|Z| = 0$ in the statement
of Proposition 1) or otherwise, one may introduce a new target qubit that applies a global phase
[1, Figure 4.5].


A. Applications 

The principal circuit equalities (1) and (2) suggest a circuit optimization procedure by which a
suitable pair of the multiple control Toffoli gates can be replaced with their relative phase or
special form relative phase implementations up to the right hand multiplication by any desired
unitary over the proper qubit set. The rules may be used interchangeably and combined. In
particular, we next illustrate how the above approach can be applied to optimize the most
popular constructs used to implement/decompose the multiple control Toffoli gates into simpler
gates. In the following discussions, we will omit unitaries $V$, with the understanding that if
need be, they may be added back in.

Corollary 2. [Optimization of the construction reported in [3, Lemma 7.2].] A multiple control
Toffoli gate $TOF^n$ can be implemented by a circuit consisting of $4n - 14$ relative phase
Toffoli gates $RTOF^3$ and a type-$\{y\}$ special form relative phase Toffoli gate
$S^{\{y\}}RTOF^3(x, y; z)$ and its inverse over a circuit with at least $2n - 3$ qubits, such as
illustrated in Figure 2.
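The resource counts in Corollary 2 are simple functions of $n$. The helper below is a hypothetical bookkeeping sketch (the function name and the returned field names are introduced here); it assumes $n \ge 4$ so that the count $4n - 14$ is non-negative.

```python
def corollary2_resources(n):
    """Gate and qubit counts stated in Corollary 2 for implementing TOF^n."""
    if n < 4:
        raise ValueError("the count 4n - 14 is negative for n < 4")
    return {
        "rtof3_gates": 4 * n - 14,        # relative phase Toffoli gates RTOF^3
        "special_form_gates": 2,          # the special form gate and its inverse
        "total_qubits": 2 * n - 3,        # minimal circuit width
        "extra_qubits": (2 * n - 3) - n,  # qubits beyond those of TOF^n itself
    }

example = corollary2_resources(6)
```

For $n = 6$ this gives 10 relative phase Toffoli gates on 9 qubits, the same numbers that appear in the caption of Figure 2 (qubits 1 to 9 and 10 relative phase Toffoli gates).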

Proof: The numeric order of subscripts in the special form and relative phase Toffoli gates
indicates the order in which the circuit equalities (2) and (1) are applied to the original
circuit reported in [3, Lemma 7.2] to obtain the desired simplified decomposition. Observe that
when during this process a pair of Toffoli gates $TOF^3(a, b; c)$ is replaced with a special
form or a relative phase implementation, the circuit in the middle may be equivalent to a
combination of a suitable multiple control Toffoli gate, possibly up to a relative phase, and a
transformation on the qubits outside the set $\{a, b, c\}$. This latter transformation may be
factored out, thereby allowing all circuit alterations to retain the original functional
correctness.

Finally, observe that the identities (1) and (2) may be used in a number of different ways,
resulting in different constructions, and not just the particular one selected in the statement
of the Corollary. In Figure 2 we used one such construction that minimizes the number of the
special form relative phase Toffoli gates to gain most freedom in substituting Toffoli gates
with their relative phase implementations. In Figure 2(c) we furthermore restricted the number
of potentially different RTOF gates via making the following assignments:
RiTOF := R3TOF, R5TOF := R-^TOF, and RqTOF := R^^TOF. This implementation will be used later in
the paper. □


Corollary 3. [Optimization of the construction reported in [3, Lemma 7.3].] A multiple control
Toffoli gate $TOF^n$ can be implemented by a circuit consisting of two relative phase Toffoli
gates RTOF^ and two special form relative phase Toffoli gates SRTOF'^~^^^ over a circuit with at
least $n + 1$ qubits, such as illustrated next:

Proof: To obtain this construction, both circuit identities (1) and (2) need to be applied once,
in any order. □


Corollary 4. [Optimization of the construction in [1, page 184].] A multiple control gate $C^nU$
can be implemented by a circuit consisting of $2n - 2$ relative phase Toffoli gates $RTOF^3$ and
one $CU$ gate over a circuit with at least $2n$ qubits of which some $n - 1$ qubits are set to
and returned in the value $|0\rangle$, such as illustrated next:

[Circuit diagram: ancilla qubits prepared in $|0\rangle$, with the gate $U$ applied to the
target.]

Proof: The circuit identity (1) is applied $n - 1$ times. □

FIG. 2: (a-b) Implementation of $TOF^6$ on the qubits 1-9 using the special form $RTOF^3$ and
its inverse, and 10 $RTOF^3$ gates, (a-c) Implementation of $TOF^6$ using a narrow selection of
the relative phase and special form relative phase Toffoli gates.

The implementation in [3, equation (13)] optimizes the depth of the circuit [1, page 184], but
does not prevent our construction from being applied. We formalize this observation in the
following Corollary.

Corollary 5. [Optimization/generalization of the construction in [3, equation (13)].] A multiple
control gate $C^nU$ can be implemented by a circuit consisting of $2n - 2$ relative phase
Toffoli gates $RTOF^3$ and one $CU$ gate over a circuit with at least $2n$ qubits of which some
$n - 1$ qubits are set to and returned in the value $|0\rangle$, such as illustrated next:

Some other optimizations include the following.

1. Circuit in [3, Lemma 7.5] may rely on the simpler relative phase multiple control Toffoli
gate and its inverse, rather than two multiple control Toffoli gates (gates #2 and #4 on the
right hand side).

2. Circuit in [3, Lemma 7.9] may rely on the simpler special form relative phase multiple
control Toffoli gate and its inverse, rather than two multiple control Toffoli gates (gates #2
and #4 on the right hand side).

3. Circuit in [3, Lemma 7.11] may rely on the simpler relative phase multiple control Toffoli
gate and its inverse, rather than two multiple control Toffoli gates (gates #1 and #3 on the
right hand side).

4. Circuit in [^, Figure 3] may rely on the simpler relative phase Toffoli gates and their
inverses, as is best seen via applying Proposition 1.


IV. OPTIMIZING IMPLEMENTATIONS OF 
THE MULTIPLE CONTROL TOFFOLI GATES 
USING THE EXISTING RELATIVE PHASE 
TOFFOLI CIRCUITS 


In this section we study in detail how to optimize the implementations of the multiple control
Toffoli gates, show that all of the known optimized implementations can be explained by means of
the relative phase Toffoli substitutions described in this work, and report some new optimized
circuits.


A. Circuit cost 

The question of the efficiency of implementing a certain 
transformation requires one to formally define a circuit 
cost. Depending on the definition of cost, certain circuits 
will be preferred over other circuits. 


There are a number of different definitions of the circuit cost used in the literature, each
originating from considering certain specific requirements. At the highest abstraction level,
firstly, one needs to determine if they are dealing with logical level or physical level
circuits.

In the former case, one has to derive the protocols and compute the costs of the constructible
fault-tolerant gates, given the selected approach to error correction. Within this framework
Clifford+T circuits have received significant attention. This is because Clifford gates such as
Pauli-X, Y, Z, Hadamard, Phase, and CNOT are believed to be relatively inexpensive to implement
fault tolerantly on the logical level. The non-Clifford gate T, or any other constructible
non-Clifford gate required for computational universality, is more difficult to generate. The
known approaches employ state purification and gate teleportation as a means of generating the T
gate, which can get quite costly in the realistic systems Q. As a result, the cost of the
implementation of a logical circuit can be very crudely approximated by the number of the T
gates used.

In the case of physical level circuits, one is limited by
the ability of the controlling apparatus to apply transformations
to the physical quantum information processing
system of choice. There is a great variety of possibilities
here. We consider a simple and popular weak
interaction model, where the single-qubit gates can be
implemented efficiently, and the only two-qubit gate, which
takes considerably more effort to implement, is the CNOT
gate. The cost of a circuit can thus be evaluated by
counting the number of CNOT gates in a circuit over
single-qubit and CNOT gates. Despite the apparent
oversimplification, there is a specific promising quantum
information processing approach where exactly this formula
describes the circuit cost at a high abstraction level.
Indeed, trapped ions with the Mølmer-Sørensen gate [?]
operate in the weak coupling regime (two-qubit gates take
roughly 10-20 fold effort to implement compared to
arbitrary single-qubit gates), and the Mølmer-Sørensen gate
itself is equivalent to the CNOT up to a conjugation by
a pair of R_z(a) and R_z(-a) gates on both qubits, for
a proper choice of the parameter a, and a few single-qubit
Phase and Hadamard gates.

An advantage of measuring the cost of circuit implementations
by the T-count and the CNOT-count is the popularity of
these metrics in the literature, which makes it possible to
compare the relative phase inspired implementations
developed in this work to the known ones.
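As a minimal illustration of the two metrics, the sketch below tallies the T-count and the CNOT-count of a circuit stored as a list of gates. The list-of-tuples encoding, the gate names, and the function name `circuit_cost` are assumptions made for this example only. The sample circuit is a nine-gate relative phase Toffoli with 4 T/T† gates, 3 CNOTs, and 2 Hadamards; its gate ordering is one reading of the construction discussed in this section.

```python
from collections import Counter

# Hypothetical encoding: each gate is (name, qubit, ...). "Tdg" stands for T-dagger.
RTOF = [
    ("H", "c"), ("T", "c"), ("CNOT", "b", "c"), ("Tdg", "c"),
    ("CNOT", "a", "c"), ("T", "c"), ("CNOT", "b", "c"), ("Tdg", "c"),
    ("H", "c"),
]

def circuit_cost(circuit):
    """Return (T-count, CNOT-count); T and its inverse both count as T gates."""
    names = Counter(gate[0] for gate in circuit)
    return names["T"] + names["Tdg"], names["CNOT"]

t_count, cnot_count = circuit_cost(RTOF)
print(t_count, cnot_count)  # 4 3
```

Neither number says anything about depth, connectivity, or ancillae, which is exactly the limitation of these metrics.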

Disadvantages of using either one of these two circuit
cost metrics are numerous. Neither metric accounts for:

• the depth, which could be more important than the
gate count, especially when one is, quite naturally,
concerned with the speed of the computation given
by a quantum circuit rather than just its size;

• the connectivity pattern of the qubits. Indeed,
physical space spans only three dimensions, and every
qubit cannot be connected to every other qubit
in a scalable fashion within a finite-dimensional
space; or

• the number of ancillary qubits used, which is particularly
important on the physical level. The number
of ancillary qubits used also influences the efficiency
of connections between primary qubits. This
is because both primary and ancillary qubits
share the same physical space and yet need to be as
close to each other as possible for higher efficiency.

FIG. 3: Toffoli gate implemented up to a relative phase:
gates 1-10 implement a type-{c} special relative phase
Toffoli gate, known as the controlled-controlled-iX in
[12], whereas the circuit with gates 2-10 implements some
generic relative phase Toffoli gate. The controlled-Z
gate CZ(a; c) may commute through the Hadamard
H(c), at which point it will change into CNOT(a; c),
and the circuit will show in an alternate form. It may
be established, via applying the result of Corollary 1,
that the CNOT count of the circuit with gates 2-10 is
optimal.

These are all very important practical considerations.
However, our goal is to demonstrate the advantages of
the framework introduced in this paper for designing
efficient circuits, therefore we restrict our attention to the
above two simplistic metrics. We furthermore encourage
applying the techniques from this paper to designing
efficient circuits in scenarios where the details of the
circuit cost function are known.


B. Toffoli and Toffoli-4 gates up to a relative phase 


Firstly, recall a circuit implementing the Toffoli gate
TOF(a, b; c) itself:

[Circuit diagram: Toffoli gate TOF(a, b; c)] (3)

This circuit may be drawn in many different ways using
no more than the minimal numbers of 6 CNOT gates and
7 T/T† gates, however, we prefer this form since it has
the largest number of gates operating on the qubits a and
c after no more gates are being applied to the qubit b.

The literature contains two apparently related implementations
of the Toffoli gate up to a relative phase, [1, page 183] and
[12], that we summarize in one distilled picture, see Figure 3.
There are more symmetries and properties to this circuit than
meet the eye at first glance. In particular,

• Gates 1-10 implement a type-{c} special form
relative phase Toffoli gate
S^cRTOF(a, b; c) = diag{1, 1, 1, 1, 1, 1, \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}};
gates 2-10 implement a relative phase Toffoli gate
RTOF(a, b; c) = diag{1, 1, 1, 1, 1, -1, \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}}.

• The first gate, the controlled-Z, can be moved
to the end of the circuit, resulting in the
construction of the type-{c} special form relative
phase Toffoli gate
S^cRTOF(a, b; c) = diag{1, 1, 1, 1, 1, 1, \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}}.

• Simultaneous substitution T → T† and T† → T
allows constructing more circuits implementing a
relative phase Toffoli gate.

• The circuit given by the gates 2-10 is self-inverse.

• Qubits a and b may be interchanged. Applying
this operation gives modified relative phase Toffoli
circuits.

• Adding gates T^m(a) and T^n(b) (powers of the T
gate), where m, n ∈ {0, 1, ..., 7}, to both the beginning
and the end of the circuit in Figure 3 allows
constructing more relative phase Toffoli gates.

• Consider gates 3-9. Using the CNOT-T algebra
terminology [?], the T gate is being applied to
{c, -(b⊕c), a⊕b⊕c, -(a⊕c)} (negative sign indicates
the application of T†). Instead, we may apply
the T gate to {c, b⊕c, -(a⊕b⊕c), -(a⊕c)}. Then,
the circuit we obtain looks as follows:

[Circuit diagram]

Observe how similar it is to [3, page 183]:
essentially, Y rotations are replaced by Z rotations.
Optimality of the above circuit employing R_y rotations
in place of T (sometimes known as the Margolus
gate) was shown in [?]. Conjugating this circuit
by a pair of Hadamard gates on the qubit c allows
one to obtain a relative phase Toffoli RTOF(a, b̄; c),
where b̄ denotes the negative control. Similarly,
if the T/T† gates of the circuit in Figure 3, gates
3-9, were replaced with R_y(π/4)/R_y(-π/4), as
illustrated next,

[Circuit diagram]

we would have obtained an RTOF(a, b; c).

We found no relative phase Toffoli-4 implementations
in the literature, but realized that one may be
constructed as follows. Consider the circuit in Figure 3,
gates 2-10. Replace CNOT(a; c) with a type-{c} special
form relative phase Toffoli gate S^cRTOF(x, a; c);
this operation introduces a new qubit, x. The result
is an RTOF(x, a, b; c). Figure 4 illustrates the
result of such a procedure for the SRTOF selection per
Figure 3 (observe that the controlled-Z was commuted
through the Hadamard gate to obtain the
CNOT). In matrix form, the gate looks as follows:

diag{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, i, -i, \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}}.

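The matrices of the relative phase Toffoli gates discussed in this subsection can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own tooling: it assumes that gates 2-10 are H, T, CNOT(b; c), T†, CNOT(a; c), T, CNOT(b; c), T†, H (consistent with the T placement {c, -(b⊕c), a⊕b⊕c, -(a⊕c)} and the 4 T / 3 CNOT / 2 H counts), that gate 1 is CZ(a; c), and that basis states are ordered with the first listed qubit as the most significant bit. All function names are hypothetical.

```python
import cmath
import math

W = cmath.exp(1j * math.pi / 4)      # phase applied by T to |1>
R = 1 / math.sqrt(2)
ONE_QUBIT = {
    "H": [[R, R], [R, -R]],
    "T": [[1, 0], [0, W]],
    "Tdg": [[1, 0], [0, W.conjugate()]],
}

def apply(gate, state, n):
    """Apply one gate to a state vector; qubit 0 is the most significant bit."""
    name, *qubits = gate
    pos = [n - 1 - q for q in qubits]            # bit position of each qubit
    out = [0j] * len(state)
    for index, amp in enumerate(state):
        bits = [(index >> p) & 1 for p in pos]
        if name == "CNOT":
            out[index ^ (bits[0] << pos[1])] += amp
        elif name == "CZ":
            out[index] += -amp if all(bits) else amp
        else:
            cleared = index & ~(1 << pos[0])
            for new in (0, 1):
                out[cleared | (new << pos[0])] += ONE_QUBIT[name][new][bits[0]] * amp
    return out

def unitary(circuit, n):
    """Matrix u[row][col] of a time-ordered gate list on n qubits."""
    dim = 2 ** n
    columns = []
    for col in range(dim):
        state = [0j] * dim
        state[col] = 1 + 0j
        for gate in circuit:
            state = apply(gate, state, n)
        columns.append(state)
    return [[columns[c][r] for c in range(dim)] for r in range(dim)]

def rtof(a, b, c, middle=None):
    """Assumed gates 2-10 on (a, b; c); `middle` replaces the central CNOT(a; c)."""
    middle = middle or [("CNOT", a, c)]
    return ([("H", c), ("T", c), ("CNOT", b, c), ("Tdg", c)] + middle +
            [("T", c), ("CNOT", b, c), ("Tdg", c), ("H", c)])

def matches(u, diagonal, block):
    """True if u = diag{diagonal..., 2x2 block} up to rounding error."""
    k = len(diagonal)
    expected = [[0j] * len(u) for _ in u]
    for i, d in enumerate(diagonal):
        expected[i][i] = d
    for r in range(2):
        for c in range(2):
            expected[k + r][k + c] = block[r][c]
    return all(abs(u[r][c] - expected[r][c]) < 1e-9
               for r in range(len(u)) for c in range(len(u)))

RTOF3 = unitary(rtof(0, 1, 2), 3)                        # gates 2-10
CCIX = unitary([("CZ", 0, 2)] + rtof(0, 1, 2), 3)        # gates 1-10
# Toffoli-4: the central CNOT is replaced by a special form gate on (x, a; c).
RTOF4 = unitary(rtof(1, 2, 3, middle=[("CZ", 0, 3)] + rtof(0, 1, 3)), 4)

print(matches(RTOF3, [1, 1, 1, 1, 1, -1], [[0, -1j], [1j, 0]]))
print(matches(CCIX, [1] * 6, [[0, 1j], [1j, 0]]))
print(matches(RTOF4, [1] * 12 + [1j, -1j], [[0, 1], [-1, 0]]))
```

Under these assumptions, moving the CZ to the end of the gate list changes the last block from iX to -iX, which can be checked by appending `("CZ", 0, 2)` after `rtof(0, 1, 2)` instead of prepending it.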

C. Results of the simplification 

Since T-count optimal and CNOT-count optimal implementations
of the three-qubit Toffoli gate are known,
we will concentrate on the Toffoli-4 and larger gates. This
section is not meant to report complete results of the
optimization that is possible to obtain (indeed, there is
no guarantee there are no better relative phase Toffoli-4
gates to be used, and we did not look for the relative
phase Toffoli-5 and larger gates), but rather to show a clear
advantage of using relative phase and special form relative
phase Toffoli gates and to motivate their further in-depth
study.

Consider a Toffoli-4 implementation via a circuit with
Clifford+T gates. Using a matrix determinant argument,
one may establish that the Toffoli-4 may not be implemented
unless at least one ancilla qubit is available. This
is because the determinant of the 16 × 16 matrix representing
the Toffoli-4 evaluates to -1, whereas the determinants
of all Clifford+T library gates, when viewed as 16 × 16
matrices, are equal to 1. A product of matrices with
determinant 1 cannot have determinant -1. As
a result, at least one ancilla is required.
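The determinant argument rests on the identity det(G ⊗ I_m) = det(G)^m: a gate acting on k of n qubits has, as a 2^n × 2^n matrix, the determinant of the bare gate raised to the power 2^(n-k). The sketch below evaluates this for the library gates on four qubits; the helper name `embedded_det` and the dictionary layout are assumptions made for the illustration.

```python
import cmath
import math

def embedded_det(det_gate, gate_qubits, total_qubits):
    """Determinant of a gate embedded into a larger register: det(G) ** 2**(n - k)."""
    return det_gate ** (2 ** (total_qubits - gate_qubits))

# (determinant of the bare gate, number of qubits it acts on)
GATE_DETS = {
    "H": (-1, 1),
    "Phase": (1j, 1),
    "T": (cmath.exp(1j * math.pi / 4), 1),
    "CNOT": (-1, 2),
}

for name, (det, k) in GATE_DETS.items():
    print(name, abs(embedded_det(det, k, 4) - 1) < 1e-9)   # True for every library gate

# Toffoli-4 swaps exactly two basis states, so its 16 x 16 determinant is -1.
print(embedded_det(-1, 4, 4))   # -1
# With one ancilla (5 qubits) the obstruction disappears.
print(embedded_det(-1, 4, 5))   # 1
```

The same computation on three qubits gives a determinant of -1 for the embedded T gate, which is why no ancilla is needed for the ordinary Toffoli gate.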

Once we have established that an ancilla qubit is required,
there are two options for the kind of ancilla qubit
it is. One, more restrictive, prescribes that the ancilla be
available in the state |0⟩; the other provides the ancilla
in some unknown state, |x⟩. In both cases, when implementing
Toffoli-4 with the help of an ancilla, special care
needs to be taken to return the ancilla to its
original state. We consider both cases next.

Optimization of Toffoli-4.

FIG. 4: Circuit implementing Toffoli-4 up to a relative phase, RTOF(a, b, c; d).


• Ancilla |0⟩, minimizing T count. The literature contains
two results, [10] and [12], both based on
the optimization of [1, page 184]. In particular,
[10] reports an optimized circuit with 15 T gates
(down from the unoptimized 21), and [12] observes that
two Toffolis can be replaced with the relative phase
Toffoli called the controlled-controlled-iX, Figure
3, which explains the optimization obtained in
[10]. Our solution uses a somewhat simpler relative
phase Toffoli, see Figure 3 dashed (gates 2-10), to
obtain TOF^4(a, b, c; d):

[Circuit diagram] (4)

There is no advantage in the number of T gates.
However, our solution explains both known circuits
and features a smaller overall gate count.

• Ancilla |0⟩, minimizing CNOT count. [12] uses the
controlled-controlled-iX to obtain an implementation
with 14 CNOTs. To our knowledge, this was
the best known result in the literature to date. Our
construction, (4), requires only 12 = 3 + 6 + 3
CNOT gates, since our relative phase Toffoli (Figure 3
dashed) requires one less CNOT gate. Observe
that, per [13], the lower bound for the number
of CNOT gates is 8. Therefore, our 12-CNOT
construction may not be improved by more than 4
CNOT gates.

• Arbitrary single-qubit ancilla, minimizing T count.
The best known solution, [10], optimizes the 28 T
gate implementation from [3, Lemma 7.2]. The
result is a circuit with 16 T gates. Our solution
matches this solution, and in fact explains
how it works. Indeed, we obtain the desired
TOF^4(a, b, c; d) as follows:

[Circuit diagram] (5)

where x is the ancilla qubit in an unknown state,
R_1TOF(a, b; x) is the relative phase Toffoli per Figure 3
dashed; and the SR_2TOF(x, c; d)V(c, d) pair
is given by (3)-dashed. Essentially, V(c, d) is designed
so as to undo all gates applied to the
qubits c and d at the end of the implementation
given by (3). We have not found a suitable special
relative phase Toffoli gate implementation that
is different from the implementation of the Toffoli
gate itself, per (3), and giving a better optimization
once combined with a proper V(c, d). The resulting
T-count of our construction is thus 16 =
4 + (7 - 3) + 4 + (7 - 3). Apart from the matching
number of T gates, our solution contains fewer Clifford
gates (e.g., 14 CNOTs vs 54 CNOTs in [10]),
and may also be rewritten as a T-depth 4 circuit
(T-depth 1 per each relative phase Toffoli stage) at
the cost of a higher number of ancillae and a higher
number of CNOT gates.


• Arbitrary single-qubit ancilla, minimizing CNOT
count. Using the CNOT-optimal implementation of the
controlled-controlled-iX from [12] over [3, Lemma 7.2]
would yield a circuit with 20 CNOT gates, as
is done in [12]. The original circuit, [3, Lemma 7.2],
uses 24 CNOT gates after each Toffoli is substituted
with its CNOT-optimal implementation.
Our construction, (5), contains 14 = 3 + 4 + 3 + 4
CNOT gates.


Observe that the above implementations, if considered as
circuits over the Clifford+T library, use the minimal number
of ancillae, namely one.
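The Toffoli-4 resource counts quoted above are sums over building blocks, and the bookkeeping can be sketched as follows. The per-block (T, CNOT) figures are taken from the text; the tuple encoding, the names, and the order in which the blocks are listed are assumptions of this illustration (the totals do not depend on the order).

```python
# (T-count, CNOT-count) per building block, as quoted in the text.
TOF = (7, 6)           # Toffoli gate: 7 T/T-dagger gates, 6 CNOTs
RTOF = (4, 3)          # relative phase Toffoli, Figure 3 gates 2-10
SRTOF_V = (7 - 3, 4)   # Toffoli combined with V: 3 T gates removed, 4 CNOTs remain

def total(*blocks):
    """Column-wise sum of (T, CNOT) pairs."""
    return tuple(sum(column) for column in zip(*blocks))

zero_ancilla = total(RTOF, TOF, RTOF)                    # circuit (4)
unknown_ancilla = total(RTOF, SRTOF_V, RTOF, SRTOF_V)    # circuit (5)
print(zero_ancilla, unknown_ancilla)  # (15, 12) (16, 14)
```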

Optimization of Toffoli-5. 

One may once again apply the determinant argument
to establish that the Toffoli-5 gate needs at least one
ancilla to be available before it may be implemented as
a Clifford+T circuit.


• All ancillae in the state |0⟩, minimizing T count.
The best known solution is given by [10] via an
optimization of the construction in [1, page 184], and
explained by [12] to be a circuit with four
controlled-controlled-iX gates and one Toffoli. The T-count
is 23 and both known solutions use two ancillae. Our
solution implementing TOF^5(a, b, c, d; e) is as follows:

[Circuit diagram] (6)

per the R_1TOF^4 implementation found in Figure 4 and the
Toffoli implementation from (3). Our solution uses
23 = 8 + 7 + 8 T gates, relies on only one ancilla,
and has a smaller total number of gates compared
to the previously known constructions.

• All ancillae in the state |0⟩, minimizing CNOT
count. The construction from [12] gives the best
known CNOT count of 22 over a circuit that uses
two ancillae. Our circuit (6) contains 18 CNOTs
and uses only one ancilla. Recall that the lower
bound for the number of CNOT gates is 10 [13].

• All ancillae in an unknown state, minimizing T
count. The best known solution is given in [10]
and features 28 T gates. Our solution implementing
TOF^5(a, b, c, d; e) is as follows:

[Circuit diagram]

where x is the ancilla qubit in an unknown state,
R_1TOF^4 is the relative phase Toffoli from Figure 4,
and the SR_2TOF(x, d; e)V(d, e) pair is given by
(3)-dashed. Observe that the overall number of
T gates is 24 = 8 + (7 - 3) + 8 + (7 - 3), that we use one
less ancilla compared to the best known construction,
and that the overall number of non-T
gates is smaller.

We can furthermore explain how to obtain the solution
with 12k - 20 T gates to implement a k-controlled
Toffoli gate using k - 2 unrestricted ancillae
featured in [10] without resorting to a computer
optimization. This is done via the use of the
relative phase Toffoli and Toffoli-C pair from Figure 3
dashed, and (3)-dashed, over the construction
reported in Corollary 2. We illustrate how
this works using the circuit from Figure ??(c) and
observe that the arguments easily generalize to arbitrary
k. Substituting the relative phase and special
relative phase-C pair implementations into the construction
in Figure ??(c) replaces each relative phase
Toffoli with a circuit containing 4 T gates. The total
number of T gates would thus be 48 (for
arbitrary k, 16k - 32), higher than the 40 of [10]. However,
observe that R_3TOF and R_3^{-1}TOF are inverses
of each other. This means that the gates T
and T† on the target qubit that the R_3TOF ends
with would cancel with the T† and T that the R_3^{-1}TOF
begins with. This cancellation happens between all
four such pairs {R_3TOF, R_3^{-1}TOF} found in the
circuit. The total reduction is thus by 8 T gates (for
arbitrary k, 4k - 12), leading to a circuit with 40 T
gates (for arbitrary k, 16k - 32 - 4k + 12 = 12k - 20).

• All ancillae in an unknown state, minimizing
CNOT count. [12] includes an implementation
where the controlled-controlled-iX is used within
[1, Lemma 7.2] for all but two gates. This construction
relies on 36 CNOT gates. For arbitrary
n, the CNOT count is 16n - 44, which we further refer
to as the cc-iX implementation in Table I. Observe
that [14] reports an implementation with 26 two-qubit
gates using two ancillae. The optimization in
[14] is motivated by a computational model where
the two-qubit interaction given by diag{I, X^t},
where t ∈ ℝ[0, 1] and X is Pauli-X, is tunable
and parametrized by time. Therefore, for example,
a controlled-√NOT would cost half as much
as the CNOT, as it only needs to be evolved for
half the time. In our calculations given here, we do
not allow such things to happen, but observe that
it would be interesting to apply the reported relative
phase Toffoli constructions within that framework.
Controlled-√NOT may be implemented as
a 2-CNOT circuit [3, Figure 4.6]. The 26 two-qubit
gate circuit of [14] has 18 controlled-√NOT gates
and 8 CNOT gates, therefore it would be transformed
into one with 44 CNOT gates. Note, however,
that it would make little sense from the point
of view of the computational model considered in
[14], as a length-0.5 interaction is being replaced
with a length-2 interaction.

In comparison, our solution, given by the same circuit
as in the previous item, has 20 (= 6 + 4 + 6 + 4) CNOT
gates and uses only one ancilla, the latter being provably
optimal within the framework of Clifford+T circuits.
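The T-count bookkeeping for the k-controlled Toffoli gate with k - 2 unrestricted ancillae, discussed in the list above, can be restated as a short sketch. The function names are hypothetical; the formulas 16k - 32, 4k - 12, and 12k - 20 are those given in the text, and the reading of 4k - 12 as two T gates per cancelling pair is inferred from the k = 5 case (four pairs, 8 T gates).

```python
def t_count_before(k):
    """4 T gates for each of the 4(k - 2) relative phase Toffoli stages."""
    return 4 * 4 * (k - 2)          # = 16k - 32

def t_count_saved(k):
    """2 T gates cancel at each of the 2k - 6 adjacent gate/inverse pairs."""
    return 2 * (2 * k - 6)          # = 4k - 12

def t_count(k):
    return t_count_before(k) - t_count_saved(k)

print(t_count_before(5), t_count_saved(5), t_count(5))  # 48 8 40
print([t_count(k) for k in range(5, 9)])                 # [40, 52, 64, 76]
```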

We generalize the above examples of Toffoli-4 and
Toffoli-5 optimization to any number of qubits in the
following two Propositions.

Proposition 4. A size n ≥ 4 multiple control Toffoli gate
TOF^n may be implemented using ⌈(n-3)/2⌉ ancillary qubits,
set to and returned in the value |0⟩, by a circuit with:

• 8n - 17 T gates,

• 6n - 12 CNOT gates, and

• 4n - 10 Hadamard gates.

Proof: The proof is by induction. The statement is
clearly true for n = 4 and n = 5, as has been explicitly
verified in the previous discussions. To prove the transition
from an even n = 2k to the odd n = 2k + 1 observe
that the middle gate TOF^3 can be replaced with the
circuit (4). This introduces an RTOF^3, Figure 3 dashed,
and its inverse. Note that a new ancillary qubit is being
introduced on this step, and the gate counts increase by
8 = 4 + 4 for T, by 6 = 3 + 3 for CNOT, and by 4 = 2 + 2
for Hadamard. The transition from an odd n = 2k + 1
to the even n = 2k + 2 is accomplished via replacing
RTOF^3 with RTOF^4, Figure 4, and its inverse with the
inverse of RTOF^4. Observe that the gate counts grow
by 8/6/4 for T/CNOT/Hadamard, but no new ancilla is
being introduced. □

Note that [12] reports a circuit with n - 3 |0⟩ ancillae,
8n - 17 T gates, 8n - 18 CNOT gates, and 4n - 10
Hadamard gates.
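The induction in the proof of Proposition 4 amounts to adding the same increment (8 T, 6 CNOT, 4 Hadamard gates) at every step, starting from the n = 4 circuit with 15 T, 12 CNOT, and 6 Hadamard gates. A minimal sketch of that arithmetic follows; the function name `prop4_counts` is an assumption, and the code checks only the gate counts, not the ancilla count.

```python
def prop4_counts(n):
    """Gate counts (T, CNOT, H) reached by the inductive steps starting at n = 4."""
    t, cnot, h = 15, 12, 6                       # base case, circuit (4)
    for _ in range(4, n):
        t, cnot, h = t + 8, cnot + 6, h + 4      # each step adds 8 T, 6 CNOT, 4 H
    return t, cnot, h

for n in range(4, 12):
    assert prop4_counts(n) == (8 * n - 17, 6 * n - 12, 4 * n - 10)

print(prop4_counts(5), prop4_counts(11))  # (23, 18, 10) (71, 54, 34)
```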

Proposition 5. A size n ≥ 5 multiple control Toffoli gate
TOF^n may be implemented using ⌈(n-3)/2⌉ ancillary
qubits residing in an arbitrary state and returned
unchanged, by a circuit with:

• 8n - 16 T gates,

• 8n - 20 CNOT gates, and

• 4n - 10 Hadamard gates.

Proof: To assist with proving this Proposition, define
the following gates:

1. RTL(a, b, c) per Figure 3, dashed. This is a relative
phase Toffoli gate. The implementation contains
9 elementary gates: 4 T gates, 3 CNOTs, and 2
Hadamards.

2. RTS(a, b, c) per Figure 3, gates 2-6. This is a relative
phase Toffoli followed by a V(b, c) that removes
the last four gates. The circuit contains 5 elementary
gates: 2 T gates, 2 CNOTs, and 1 Hadamard.

3. SRTS(a, b, c) per circuit (3), dashed. This is a Toffoli
gate (as such it is also a type-{&} special form
relative phase Toffoli) followed by a V(a, c) that
removes the last six gates. SRTS contains 9 elementary
gates: 4 T gates, 4 CNOTs, and 1 Hadamard.

4. RT4L(a, b, c, d) per Figure 4. This is a 4-qubit relative
phase Toffoli. It contains 8 T gates, 6 CNOTs,
and 4 Hadamards.

5. RT4S(a, b, c, d) per Figure 4, dashed. This is a
relative phase Toffoli-4 RT4L(a, b, c, d) followed by
a V(b, c, d) that removes the last 8 gates. It is composed
of the following elementary gates: 4 T gates,
4 CNOTs, and 2 Hadamards.

We first prove the Proposition for the resource count of
n - 3 ancillae, 8n - 16 T gates, 8n - 18 CNOT gates,
and 4n - 10 Hadamard gates, and then introduce the
RT4L/RT4S gates that further improve the ancilla and
CNOT count. The proof relies on the construction found
in Figure ??(c). Assuming qubits are numbered 1 to 2n - 3
and we are attempting to implement
TOF^n(1, 2, ..., n - 1; 2n - 3), select the gates in
Figure ??(c) as follows:

1. First gate is SRTS(n - 1, 2n - 4, 2n - 3).

2. Next k = 1..n - 4 gates are
RTS(2n - 4 - k, n - 1 - k, 2n - 3 - k).

3. Next gate is RTL(1, 2, n).

4. Next k = 1..n - 4 gates are RTS^{-1}(n - 1 + k, k + 2, n + k)
(inverses of the gates in item 2 read in
reverse order).

5. Next gate is SRTS^{-1}(n - 1, 2n - 4, 2n - 3) (this is
the matching inverse pair for the gate in item 1).

6. Next k = 1..n - 4 gates are
RTS(2n - 4 - k, n - 1 - k, 2n - 3 - k) (same as item 2).

7. Next gate is RTL^{-1}(1, 2, n) (this is the matching
inverse for the gate in item 3).

8. Last k = 1..n - 4 gates are RTS^{-1}(n - 1 + k, k + 2, n + k)
(same as item 4).

Observe that the desired preliminary gate counts are
satisfied. The next step is introducing RT4L/RT4S gates to
replace as many RTL and RTS as possible.
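The eight-item gate selection above can be generated mechanically, which makes the preliminary gate counts easy to confirm. In the sketch below the per-gate costs (T, CNOT, H) are those given in the gate definitions of the proof; the tuple encoding `(name, is_inverse, qubits)` and the function names are assumptions made for this illustration.

```python
# (T, CNOT, H) per gate type, as defined in the proof.
COST = {"SRTS": (4, 4, 1), "RTS": (2, 2, 1), "RTL": (4, 3, 2)}

def preliminary_circuit(n):
    """Gate list (name, is_inverse, qubits) for items 1-8; qubits are numbered from 1."""
    down = [("RTS", False, (2 * n - 4 - k, n - 1 - k, 2 * n - 3 - k))
            for k in range(1, n - 3)]                     # items 2 and 6
    up = [("RTS", True, (n - 1 + k, k + 2, n + k))
          for k in range(1, n - 3)]                       # items 4 and 8
    srts = (n - 1, 2 * n - 4, 2 * n - 3)
    return ([("SRTS", False, srts)] + down + [("RTL", False, (1, 2, n))] + up +
            [("SRTS", True, srts)] + down + [("RTL", True, (1, 2, n))] + up)

def counts(circuit):
    """Total (T, CNOT, H) of a gate list."""
    return tuple(sum(COST[name][i] for name, _, _ in circuit) for i in range(3))

print(counts(preliminary_circuit(7)))  # (40, 38, 18)
```

For n = 7 the list has 16 gates, and the RTS gates of item 4 act on the same qubit triples as those of item 2 in reverse order, as the proof requires.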

1. First, replace the circuit
RTS(n, 3, n + 1) RTL(1, 2, n) RTS^{-1}(n, 3, n + 1) (last gate
in item 2, the gate in item 3, and first
gate in item 4) with RT4L(1, 2, 3, n + 1), and
RTS(n, 3, n + 1) RTL^{-1}(1, 2, n) RTS^{-1}(n, 3, n + 1)
(last gate in item 6, the gate in item 7, and first
gate in item 8) with RT4L^{-1}(1, 2, 3, n + 1). Note
that this procedure may only apply for n ≥ 5.
It furthermore reduces the CNOT count from
7 = 2 + 3 + 2 to 6 twice, for a total saving of
2 CNOTs. Finally, observe that the qubit n is
no longer used. Thus, we save one ancillary qubit
worth of computational space.

2. For k = 1..⌊(n-5)/2⌋ we introduce four RT4S gates
by replacing a pair of neighbouring RTS on the
left and right hand sides of the previous step. In
particular, we replace
RTS(n + 2k, 2k + 3, n + 2k + 1) RTS(n - 1 + 2k, 2k + 2, n + 2k)
(item 2) with RT4S(n - 1 + 2k, 2k + 2, 2k + 3, n + 2k + 1) and
RTS^{-1}(n - 1 + 2k, 2k + 2, n + 2k) RTS^{-1}(n + 2k, 2k + 3, n + 2k + 1)
(item 4) with RT4S^{-1}(n - 1 + 2k, 2k + 2, 2k + 3, n + 2k + 1),
and similarly in the second
half of the circuit (items 6 and 8). Observe that this
operation does not change the gate counts, but frees
up qubit n + 2k that is no longer used, providing a
reduction of one ancilla.

The total reductions from the above construction are a
pair of CNOT gates and ⌊(n-3)/2⌋ qubits, leading to the
resource counts as announced in the statement of the
Proposition.

Looking at the following circuit helps visualize all
replacements and gate counts:

[Circuit diagram]

T:   4 2 2 2 4 2 2 2 4 2 2 2 4 2 2 2
T3C: 4 2 2 2 3 2 2 2 4 2 2 2 3 2 2 2
T4C: 4 4 6 4 4 4 6 4
H:   1 1 1 1 2 1 1 1 1 1 1 1 2 1 1 1

In the above, dashed gates are replaced with
RT4L(1, 2, 3, 8) and its inverse, freeing qubit 7, and dotted
gates are replaced with RT4S(8, 4, 5, 10) and its inverse,
freeing qubit 9. The line starting with "T" reports the
T count, the line starting with "T3C" reports the CNOT
count when only RTOF^3 are being used, the line starting
with "T4C" reports the CNOT count when RTOF^4
are allowed, and the line starting with "H" reports the
Hadamard gate count. □
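The per-gate rows of such a picture, and the effect of the RT4L/RT4S replacements on the totals, can be reproduced with a few lines of code. The sketch below uses the (T, CNOT, H) costs from the gate definitions in the proof; the function names, the list encoding, and the convention that an unpaired RTS stays next to SRTS are assumptions of this illustration.

```python
# (T, CNOT, H) per gate type, as defined in the proof.
COST = {"SRTS": (4, 4, 1), "RTS": (2, 2, 1), "RTL": (4, 3, 2),
        "RT4L": (8, 6, 4), "RT4S": (4, 4, 2)}

def half(n, use_rt4):
    """Gate names in the first half of the circuit; the second half has the same pattern."""
    m = n - 4                                   # RTS gates on each side of RTL
    if not use_rt4:
        return ["SRTS"] + ["RTS"] * m + ["RTL"] + ["RTS"] * m
    pairs, leftover = divmod(m - 1, 2)          # one RTS per side is absorbed by RT4L
    side = ["RTS"] * leftover + ["RT4S"] * pairs
    return ["SRTS"] + side + ["RT4L"] + side[::-1]

def row(n, use_rt4, column):
    """Per-gate counts; column 0 = T, 1 = CNOT, 2 = Hadamard."""
    return [COST[gate][column] for gate in half(n, use_rt4) * 2]

def ancillae(n, use_rt4):
    """n - 3 ancillae, minus one for RT4L and one per RT4S replacement step."""
    return n - 3 - ((1 + (n - 5) // 2) if use_rt4 else 0)

print(row(7, False, 1))  # [4, 2, 2, 2, 3, 2, 2, 2, 4, 2, 2, 2, 3, 2, 2, 2]
print(row(7, True, 1))   # [4, 4, 6, 4, 4, 4, 6, 4]
print(sum(row(7, True, 0)), sum(row(7, True, 1)), sum(row(7, True, 2)), ancillae(7, True))  # 40 36 18 2
```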

We summarize the results in Table I and compare them
against the best known. The names of the columns are
self-explanatory. Observe that [14] features multiple control
Toffoli implementations using 12n - 34 two-qubit gates
over a circuit with n - 3 ancillae. In comparison, our
implementation uses 8n - 20 CNOT gates over a circuit
with only ⌈(n-3)/2⌉ ancillae. It is furthermore interesting
to highlight that, in terms of implementing a multiple
control Toffoli gate, the difference between using
unrestricted ancillae and ancillae residing in the state |0⟩
is only one T gate, but in terms of the CNOTs it is a
noticeable term, 2n - 8.

V. OPEN PROBLEMS 

The problem of systematically synthesizing and
analyzing multiple control relative phase Toffoli
implementations, both unrestricted and of the
special form, is important to address next. The results
of such a search could be used directly to optimize
implementations of the multiple control Toffoli gates,
arithmetic parts of quantum algorithms, and reversible
circuits.

How efficient may a relative phase multiple control Toffoli
gate implementation be? In the 3-qubit case the answer
is: it requires at least 3 CNOTs as a circuit over
the library of CNOT and arbitrary single-qubit gates, as
otherwise, per Corollary 1, we would come to a contradiction
with the lower bound on the CNOT count of the Toffoli
gate [13]. If it is established that
the Toffoli gate requires 7 T gates in the presence of
ancillae, a similar argument can be applied towards showing
that any relative phase Toffoli gate requires at least 4 T
gates as a circuit over the Clifford+T library.

The reported constructions obtain the best solutions
simultaneously for two circuit cost metrics arising from
different considerations, the CNOT-count and the T-count.
It may be that this is not a coincidence. Is there a
relation between these two resource counts?


VI. CONCLUSION 

In this paper, we reported an approach for systematic
optimization of quantum circuits via replacing suitable
pairs of the multiple control Toffoli gates with their
relative phase implementations. This operation preserves the
functional correctness. However, since the relative phase
Toffolis are easier to implement than their regular
counterparts, the advantage can be witnessed through the
optimized resource counts. We have furthermore illustrated
the advantage via optimizing and, when applicable,
explaining the nature of the best known implementations of
the multiple control Toffoli gates. Our demonstrated
optimizations include a simultaneous optimization of the
T count by a factor of 3/2 in the leading constant, the
CNOT count by a factor of 2 in the leading constant, and
the number of ancillary qubits by a factor of 2 in the
leading constant. The above refers to the optimization of the
circuit implementing the multiple control Toffoli gate
using arbitrary ancillae, whose construction resulted from
employing the relative phase Toffoli gates.

Gate          Source      Optimization goal  #T      #CNOT   #H      #P/P†  # ancillae  Ancillae type
TOF^4         [10]        T                  15      35      6       3      1           |0⟩
              [12]        T                  15      14      6       0      1           |0⟩
              Ours        T, CNOT            15      12      6       0      1           |0⟩
              [10]        T                  16      54      6       6      1           |x⟩
              cc-iX [12]  CNOT               22      20      8       0      1           |x⟩
              Ours        T, CNOT            16      14      6       0      1           |x⟩
TOF^5         [10]        T                  23      63      10      6      2           |00⟩
              [12]        T                  23      22      10      0      2           |00⟩
              Ours        T, CNOT            23      18      10      0      1           |0⟩
              [10]        T                  28      90      10      13     2           |xx⟩
              cc-iX [12]  CNOT               38      36      16      0      2           |xx⟩
              Ours        T, CNOT            24      20      10      0      1           |x⟩
TOF^6         [10]        T                  31      94      14      9      3           |000⟩
              [12]        T                  31      30      14      0      3           |000⟩
              Ours        T, CNOT            31      24      14      0      2           |00⟩
              [10]        T                  40      132     14      20     3           |xxx⟩
              cc-iX [12]  CNOT               46      52      24      0      3           |xxx⟩
              Ours        T, CNOT            32      28      14      0      2           |xx⟩
TOF^11        [10]        T                  71      232     34      24     8           |00000000⟩
              [12]        T                  71      70      34      0      8           |00000000⟩
              Ours        T, CNOT            71      54      34      0      4           |0000⟩
              [10]        T                  100     328     34      55     8           |xxxxxxxx⟩
              cc-iX [12]  CNOT               134     132     64      0      8           |xxxxxxxx⟩
              Ours        T, CNOT            72      68      34      0      4           |xxxx⟩
TOF^n, n ≥ 5  [10]        T                  8n-17   N/A     N/A     N/A    n-3         |00...0⟩
              [12]        T                  8n-17   8n-18   4n-10   0      n-3         |00...0⟩
              Ours        T, CNOT            8n-17   6n-12   4n-10   0      ⌈(n-3)/2⌉   |00...0⟩
              [10]        T                  12n-32  N/A     N/A     N/A    n-3         |xx...x⟩
              cc-iX [12]  CNOT               16n-42  16n-44  8n-24   0      n-3         |xx...x⟩
              Ours        T, CNOT            8n-16   8n-20   4n-10   0      ⌈(n-3)/2⌉   |xx...x⟩

TABLE I: Optimization of the multiple control Toffoli gates using RTOF^3 and RTOF^4 gates.